19 research outputs found

    Irony Detection in Twitter with Imbalanced Class Distributions

    [EN] Irony detection is a non-trivial problem, and addressing it can help to improve natural language processing tasks such as sentiment analysis. When dealing with social media data in real scenarios, an important issue to address is data skew, i.e. the imbalance between the available ironic and non-ironic samples. In this work, the main objective is to address irony detection in Twitter considering various degrees of imbalance between classes. We rely on the emotIDM irony detection model and evaluate it against both benchmark corpora and skewed Twitter datasets collected to simulate a realistic distribution of ironic tweets. We carry out a set of classification experiments aimed at determining the impact of class imbalance on detecting irony, and we evaluate the performance of irony detection under different scenarios. We experiment with a set of classifiers, applying class imbalance techniques to compensate for the skewed class distribution. Our results indicate that, by using such techniques, it is possible to improve the performance of irony detection in imbalanced class scenarios.

    The first author was funded by CONACYT project FC-2016/2410. Ronaldo Prati was supported by the São Paulo State (Brazil) research council FAPESP under project 2015/20606-6. Francisco Herrera was partially supported by the Spanish National Research Project TIN2017-89517-P. The work of Paolo Rosso was partially supported by the Spanish MICINN under the research project MISMIS (PGC2018-096212-B-C31) and by the Generalitat Valenciana under the grant PROMETEO/2019/121.

    Hernandez-Farias, D.I.; Prati, R.; Herrera, F.; Rosso, P. (2020). Irony Detection in Twitter with Imbalanced Class Distributions. Journal of Intelligent & Fuzzy Systems, 39(2), 2147-2163. https://doi.org/10.3233/JIFS-179880
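    As a rough illustration of the setup this abstract describes (not the authors' emotIDM model, whose features are not reproduced here), the sketch below trains a linear classifier on a synthetic, heavily skewed binary problem with and without SMOTE oversampling, one common class-imbalance compensation technique; the data, feature dimensionality and scores are placeholders.

```python
# Minimal sketch (not the emotIDM pipeline): compare a linear classifier trained
# on skewed data with and without SMOTE oversampling of the minority class.
# Synthetic features stand in for real tweet representations.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

# ~5% "ironic" (minority) examples, mimicking a realistic skew.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

# Baseline: train directly on the imbalanced data.
base = LinearSVC(dual=False).fit(X_tr, y_tr)
print("F1 (no resampling):", f1_score(y_te, base.predict(X_te)))

# Oversample the minority class with SMOTE, then retrain.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
bal = LinearSVC(dual=False).fit(X_res, y_res)
print("F1 (SMOTE):", f1_score(y_te, bal.predict(X_te)))
```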

    Figurative Messages and Affect in Twitter: Differences Between #irony, #sarcasm and #not

    The use of irony and sarcasm has been proven to be a pervasive phenomenon in social media, posing a challenge to sentiment analysis systems. Such devices, in fact, can influence and twist the polarity of an utterance in different ways. A new dataset of over 10,000 tweets including a high variety of figurative language types, manually annotated with sentiment scores, has been released in the context of Task 11 of SemEval-2015. In this paper, we propose an analysis of the tweets in the dataset to investigate the open research issue of how separate the figurative linguistic phenomena of irony and sarcasm are, with a special focus on the role of features related to the multi-faceted affective information expressed in such texts. We considered for our analysis tweets tagged with #irony and #sarcasm, and also the tag #not, which had not been studied in depth before. A distribution and correlation analysis over a set of features, including a wide variety of psycholinguistic and emotional features, suggests arguments for the separation between irony and sarcasm. The outcome is a novel set of sentiment, structural and psycholinguistic features evaluated in binary classification experiments. We report on classification experiments carried out on a corpus previously used for #irony vs. #sarcasm, and we outperform the state-of-the-art results on this dataset in terms of F-measure. Overall, our results confirm the difficulty of the task, but introduce new data-driven arguments for the separation between #irony and #sarcasm. Interestingly, #not emerges as a distinct phenomenon.

    The National Council for Science and Technology (CONACyT Mexico) has funded the research work of Delia Irazu Hernandez Farias (Grant No. 218109/313683 CVU-369616). Paolo Rosso has been partially funded by the SomEMBED MINECO research project (TIN2015-71147-C2-1-P) and by the Generalitat Valenciana under the grant ALMAMATER (PrometeoII/2014/030). The work of Viviana Patti was partially carried out at the Universitat Politecnica de Valencia within the framework of a fellowship of the University of Turin co-funded by Fondazione CRT (WWS Program 2).

    Sulis, E.; Hernandez-Farias, D.I.; Rosso, P.; Patti, V.; Ruffo, G. (2016). Figurative Messages and Affect in Twitter: Differences Between #irony, #sarcasm and #not. Knowledge-Based Systems, 108, 132-143. https://doi.org/10.1016/j.knosys.2016.05.035
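    The feature-based classification setup can be pictured with the minimal sketch below. It is not the paper's feature set or lexicons: the toy sentiment word lists, structural counts and example tweets are placeholders standing in for the psycholinguistic and emotional resources used in the study, feeding an ordinary logistic-regression classifier for #irony vs. #sarcasm.

```python
# Minimal sketch: hand-crafted structural and affective features for a binary
# #irony-vs-#sarcasm classifier. Lexicons, tweets and labels are placeholders.
import re
from sklearn.linear_model import LogisticRegression

POSITIVE = {"love", "great", "happy"}      # toy stand-in for a sentiment lexicon
NEGATIVE = {"hate", "awful", "terrible"}

def features(tweet: str):
    tokens = re.findall(r"\w+", tweet.lower())
    return [
        tweet.count("!"),                         # punctuation intensity
        tweet.count("#"),                         # hashtag count
        sum(t in POSITIVE for t in tokens),       # positive lexicon hits
        sum(t in NEGATIVE for t in tokens),       # negative lexicon hits
        sum(w.isupper() for w in tweet.split()),  # all-caps ("shouting") words
    ]

tweets = ["I just LOVE waiting in line for hours #sarcasm",
          "Great, another Monday #irony"]
labels = [1, 0]  # 1 = #sarcasm, 0 = #irony (hypothetical labels)

clf = LogisticRegression().fit([features(t) for t in tweets], labels)
print(clf.predict([features("What a terrible surprise #irony")]))
```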

    A Knowledge-Based Weighted KNN for Detecting Irony in Twitter

    [EN] In this work, we propose a variant of a well-known instance-based algorithm: WKNN. Our idea is to exploit task-dependent features in order to calculate the weight of the instances according to a novel paradigm, the Textual Attraction Force, which serves to quantify the degree of relatedness between documents. The proposed method was applied to a challenging text classification task: irony detection. We experimented with state-of-the-art corpora. The obtained results show that, despite being a simple approach, our method is competitive with respect to more advanced techniques.

    This research was funded by CONACYT project FC 2016-2410. The work of P. Rosso has been funded by the SomEMBED TIN2015-71147-C2-1-P MINECO research project. The work of V. Patti was partially funded by Progetto di Ateneo/CSP 2016 (IhatePrejudice, S1618_L2_BOSC_01).

    Hernandez-Farias, D.I.; Montes Gomez, M.; Escalante, H.; Rosso, P.; Patti, V. (2018). A Knowledge-Based Weighted KNN for Detecting Irony in Twitter. Lecture Notes in Computer Science, 11289, 1-13. https://doi.org/10.1007/978-3-030-04497-8_16
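    The instance-weighting idea can be sketched as follows. The abstract does not spell out the Textual Attraction Force formula, so plain inverse-distance weights (in the spirit of distance-weighted KNN) stand in for it here, and the document vectors and labels are toy placeholders.

```python
# Minimal sketch of weighted KNN classification: each of the k nearest training
# documents votes with a weight; inverse distance is used here as a stand-in
# for the paper's Textual Attraction Force.
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=3):
    """Classify x by a weighted vote of its k nearest training documents."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-8)   # closer documents count more
    votes = {}
    for idx, w in zip(nearest, weights):
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + w
    return max(votes, key=votes.get)

X_train = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])  # toy document vectors
y_train = ["ironic", "ironic", "not-ironic"]
print(weighted_knn_predict(X_train, y_train, np.array([0.85, 0.15])))
```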

    ValenTo: Sentiment Analysis of Figurative Language Tweets with Irony and Sarcasm

    This paper describes the system used by the ValenTo team in Task 11, Sentiment Analysis of Figurative Language in Twitter, at SemEval-2015. Our system used a regression model and additional external resources to assign polarity values. A distinctive feature of our approach is that we used not only word-sentiment lexicons providing polarity annotations, but also novel resources for dealing with emotions and psycholinguistic information. These are important aspects to tackle in figurative language such as irony and sarcasm, which were represented in the dataset. The system also exploited novel and standard structural features of tweets. Considering the different kinds of figurative language in the dataset, our submission obtained good results in recognizing sentiment polarity in both ironic and sarcastic tweets.
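    A minimal sketch of such a regression setup is shown below. It does not reproduce ValenTo's resources or features: the polarity lexicon, structural counts, example tweets and gold scores are hypothetical placeholders feeding a ridge regressor that assigns a real-valued polarity score to each tweet.

```python
# Minimal sketch: lexicon-derived and structural features feeding a ridge
# regressor that predicts a real-valued polarity score per tweet.
import re
from sklearn.linear_model import Ridge

POLARITY = {"love": 2.0, "great": 1.5, "hate": -2.0, "waiting": -0.5}  # toy lexicon

def tweet_features(tweet: str):
    tokens = re.findall(r"\w+", tweet.lower())
    return [
        sum(POLARITY.get(t, 0.0) for t in tokens),  # summed lexicon polarity
        tweet.count("!"),                           # exclamation marks
        tweet.count("#"),                           # hashtags
        len(tokens),                                # tweet length
    ]

tweets = ["I love waiting for delayed flights #sarcasm", "What a great day"]
scores = [-3.0, 2.5]  # hypothetical gold polarity scores

model = Ridge(alpha=1.0).fit([tweet_features(t) for t in tweets], scores)
print(model.predict([tweet_features("I hate this #irony")]))
```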